
    Desiderata for the development of next-generation electronic health record phenotype libraries

    Background: High-quality phenotype definitions are desirable to enable the extraction of patient cohorts from large electronic health record repositories, and are characterized by properties such as portability, reproducibility, and validity. Phenotype libraries, where definitions are stored, have the potential to contribute significantly to the quality of the definitions they host. In this work, we present a set of desiderata for the design of a next-generation phenotype library that is able to ensure the quality of hosted definitions by combining the functionality currently offered by disparate tooling.

    Methods: A group of researchers examined work to date on phenotype models, implementation, and validation, as well as contemporary phenotype libraries developed as part of their own phenomics communities. Existing phenotype frameworks were also examined. This work was translated and refined by all the authors into a set of best practices.

    Results: We present 14 library desiderata that promote high-quality phenotype definitions, in the areas of modelling, logging, validation, and sharing and warehousing.

    Conclusions: There are a number of choices to be made when constructing phenotype libraries. Our considerations distil the best practices in the field and include pointers towards their further development to support portable, reproducible, and clinically valid phenotype design. The provision of high-quality phenotype definitions enables electronic health record data to be used more effectively in medical domains.
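
    To make the modelling, logging, and validation desiderata concrete, the sketch below shows one possible shape for a library entry. The field names and example codes are illustrative assumptions, not the authors' model.

```python
# Hypothetical sketch of a phenotype library entry carrying the metadata
# the desiderata call for: portable codelists, reproducible versioning,
# logged provenance, and recorded validation evidence.
from dataclasses import dataclass, field

@dataclass
class PhenotypeDefinition:
    name: str
    version: str                 # reproducibility: immutable, versioned releases
    codelists: dict              # portability: codes per terminology
    logic: str                   # human-readable inclusion/exclusion logic
    provenance: list = field(default_factory=list)   # logging: change history
    validations: list = field(default_factory=list)  # validity: e.g. reported PPV

# Illustrative entry (codes shown for example only)
t2dm = PhenotypeDefinition(
    name="Type 2 diabetes mellitus",
    version="1.0.0",
    codelists={"ICD-10": ["E11"], "Read v2": ["C10F."]},
    logic="At least one diagnosis code AND one glucose-lowering prescription",
)
t2dm.provenance.append("2021-06-01: initial release")
```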

    Optimising medication data collection in a large-scale clinical trial

    Objective: Pharmaceuticals play an important role in clinical care. However, in community-based research, medication data are commonly collected as unstructured free text, which is prohibitively expensive to code for large-scale studies. The ASPirin in Reducing Events in the Elderly (ASPREE) study developed a two-pronged framework to collect structured medication data for 19,114 individuals. ASPREE provides an opportunity to determine whether medication data can be cost-effectively collected and coded en masse from the community using this framework.

    Methods: The ASPREE framework of a type-to-search box with automated coding and linked free-text entry was compared to the traditional method of free-text-only collection and post hoc coding. Reported medications were classified according to their method of collection and analysed by Anatomical Therapeutic Chemical (ATC) group. The relative cost of collecting medications was determined by calculating the time required for database set-up and medication coding.

    Results: Overall, 122,910 structured participant medication reports were entered using the type-to-search box and 5,983 were entered as free text. Free-text data contributed 211 unique medications not present in the type-to-search box. Spelling errors and unnecessary provision of additional information were among the top reasons why medications were reported as free text. The cost per medication using the ASPREE method was approximately USD 0.03, compared with USD 0.20 per medication for the traditional method.

    Conclusion: Implementation of this two-pronged framework is a cost-effective alternative to free-text-only data collection in community-based research. The higher initial set-up costs of this combined method are justified by long-term cost-effectiveness and the scientific potential for analysis and discovery gained through the collection of detailed, structured medication data.
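
    As a rough illustration of the two-pronged idea, the sketch below matches typed input against a pre-coded medication dictionary and falls back to free text when no match is found. The dictionary contents, ATC mappings shown, and function names are assumptions for illustration, not ASPREE's implementation.

```python
# Hypothetical sketch: type-to-search against an ATC-coded dictionary,
# with a free-text fallback for unmatched entries (coded post hoc).
ATC_DICTIONARY = {
    "aspirin": "B01AC06",
    "atorvastatin": "C10AA05",
    "metformin": "A10BA02",
}

def suggest(prefix, limit=10):
    """Return type-to-search suggestions for a typed prefix."""
    p = prefix.strip().lower()
    return [name for name in ATC_DICTIONARY if name.startswith(p)][:limit]

def record_medication(entry):
    """Store a structured, pre-coded report when possible, else free text."""
    name = entry.strip().lower()
    if name in ATC_DICTIONARY:
        return {"medication": name, "atc": ATC_DICTIONARY[name], "structured": True}
    # Fallback captures misspellings and drugs missing from the dictionary
    return {"medication": entry, "atc": None, "structured": False}

print(suggest("a"))                  # ['aspirin', 'atorvastatin']
print(record_medication("Asprin"))   # misspelling -> free-text fallback
```

    Keeping the free-text path is also what lets the structured dictionary grow: medications it missed (211 in ASPREE) can be coded once post hoc and promoted into the type-to-search list.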

    Secure and scalable deduplication of horizontally partitioned health data for privacy-preserving distributed statistical computation

    Background: Techniques have been developed to compute statistics on distributed datasets without revealing private information except the statistical results. However, duplicate records in a distributed dataset may lead to incorrect statistical results. Therefore, to increase the accuracy of the statistical analysis of a distributed dataset, secure deduplication is an important preprocessing step.

    Methods: We designed a secure protocol for the deduplication of horizontally partitioned datasets with deterministic record linkage algorithms. We provided a formal security analysis of the protocol in the presence of semi-honest adversaries. The protocol was implemented and deployed across three microbiology laboratories located in Norway, and we ran experiments on the datasets in which the number of records for each laboratory varied. Experiments were also performed on simulated microbiology datasets and data custodians connected through a local area network.

    Results: The security analysis demonstrated that the protocol protects the privacy of individuals and data custodians under a semi-honest adversarial model. More precisely, the protocol remains secure with the collusion of up to N − 2 corrupt data custodians. The total runtime for the protocol scales linearly with the addition of data custodians and records. One million simulated records distributed across 20 data custodians were deduplicated within 45 s. The experimental results showed that the protocol is more efficient and scalable than previous protocols for the same problem.

    Conclusions: The proposed deduplication protocol is efficient and scalable for practical uses while protecting the privacy of patients and data custodians.
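
    The sketch below shows one common building block for this kind of privacy-preserving deduplication: each custodian applies a keyed hash (HMAC) under a shared secret to its deterministic linkage key, so duplicates can be detected by comparing hashes without exchanging raw identifiers. This illustrates the general idea only; the paper's actual protocol, which stays secure against up to N − 2 colluding custodians, is more involved, and all names here are assumptions.

```python
# Minimal sketch (not the paper's protocol): keyed-hash pseudonyms for
# deterministic record linkage across horizontally partitioned datasets.
import hashlib
import hmac

SHARED_KEY = b"agreed-out-of-band"  # assumption: custodians share a secret key

def pseudonymise(record):
    """Build the deterministic linkage key and return its keyed hash."""
    key = "|".join([record["national_id"], record["birth_date"]]).lower()
    return hmac.new(SHARED_KEY, key.encode(), hashlib.sha256).hexdigest()

def deduplicate(partitions):
    """Count distinct individuals across partitions (one per custodian)."""
    seen = set()
    for partition in partitions:
        for record in partition:
            seen.add(pseudonymise(record))  # equal records collide here
    return len(seen)

lab_a = [{"national_id": "123", "birth_date": "1970-01-01"}]
lab_b = [{"national_id": "123", "birth_date": "1970-01-01"},  # duplicate of lab_a's record
         {"national_id": "456", "birth_date": "1985-05-05"}]
print(deduplicate([lab_a, lab_b]))  # 2 distinct individuals
```

    A keyed hash rather than a plain one matters here: without the shared secret, an adversary could mount a dictionary attack by hashing candidate identifiers and matching them against the pseudonyms.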

    Biomedical informatics and translational medicine

    Biomedical informatics involves a core set of methodologies that can provide a foundation for crossing the "translational barriers" associated with translational medicine. To this end, the fundamental aspects of biomedical informatics (e.g., bioinformatics, imaging informatics, clinical informatics, and public health informatics) may be essential in helping to improve the ability to bring basic research findings to the bedside, evaluate the efficacy of interventions across communities, and enable the assessment of the eventual impact of translational medicine innovations on health policies. Here, a brief description is provided for a selection of key biomedical informatics topics (Decision Support, Natural Language Processing, Standards, Information Retrieval, and Electronic Health Records) and their relevance to translational medicine. Based on contributions and advancements in each of these topic areas, the article proposes that biomedical informatics practitioners ("biomedical informaticians") can be essential members of translational medicine teams.